ABSTRACT


This thesis introduces exploratory data analysis methods
into the question of categorizing pilots and relating these
categories to accident potential. The flight data usually
recorded cover the pilots' total flight experience,
recency, and frequency of flying. The purpose of categorizing
is to determine if the recorded flight data could help
discriminate between two original sample groups of fifty pilots
each, those pilots with accidents during FY73 and those
without.

The technique of linear discriminant analysis indicated
that there is a significant difference in the mean vectors
of flight data for the two groups. The computed discriminant
function produced an empirical correct classification rate
of 81%. Techniques of cluster analysis (with the aid of
principal components analysis) are also employed to detect
patterns or differences in the data. Curiously, the amount
of time flown in the last 48 hours is associated with
relatively low accident potential, whereas time flown in the
last 24 hours seems to be correlated with a higher accident
potential.

I. INTRODUCTION AND OBJECTIVES


Aviation safety in the United States Navy has always 
received considerable attention. With the rapidly increas- 
ing costs of naval aircraft and the increasing costs of 
training naval aviators, it is imperative that every possi- 
ble aspect of aviation safety be thoroughly investigated. 
It is important to search all paths which may yield any 
information at all having a bearing on aircraft accident 
causation or prevention. 

Over the last five years, approximately fifty per cent 
of the major and minor aircraft accidents in the Navy have 
included pilot error as either the primary factor involved 
or as a contributing factor to the cause of the accident. 

Many reasons have been put forward as causes
of pilot error accidents, ranging anywhere from plain lack
of physical coordination to mental incompetence. A general
term which relates to both physical and mental abilities is
experience. That is, as flying experience increases, the
learning process should increase both of these abilities.
Another general term which affects these two abilities is
proficiency. That is, recency and frequency of flying
should also have a direct bearing on these abilities.

This thesis explores methods for classifying or
categorizing pilots according to variables associated with their
experience and proficiency. Accident records are used to
determine if there is any relation between the
classifications and the occurrence of accidents.

One would like to know, by investigating a pilot's
experience and proficiency data, whether he shows a
high or low accident potential. Of specific interest is the
question of whether pilot error accidents are related to
lack of total flying experience, lack of experience in type
aircraft, or lack of practice due to insufficient current
flying. If an individual were classified as having a high
degree of accident potential, then corrective action could
be taken to reduce this potential.

Only the pilots of Navy fixed-wing aircraft are studied. 
Marine and/or helicopter pilots are not included. The study 
encompasses those accidents that occurred during fiscal year 
1973. Unfortunately, the data base contains the records of 
only fifty aviators who have been involved in pilot error 
accidents. Fifty other pilots were selected as a control. 
Even with these small numbers, a result appeared that may be
worth pursuing further. Recency of flying may be overdone.
The amount of time flown in the last 48 hours is positively
correlated with low accident potential, but a reversal seems
to take place when looking at the time flown in the last
24 hours.








II. FACTORS INFLUENCING EXPERIENCE AND PROFICIENCY 


There are many factors affecting experience and profi- 
ciency. Situations encountered, crises faced, types of 
missions flown, and many other qualitative factors have a 
definite bearing. However, the only factors considered here 
are quantitative variables which can be obtained from acci- 
dent records and IFARS (Individual Flight Activity Reporting 
System) pilot records.

The Naval Safety Center at Norfolk, Virginia maintains 
records of all accidents in which Naval aircraft are involved. 
The recorded data items which reflect a pilot's total 
experience are the following: 


Number of years designated a naval aviator
Total flying hours
Total flying hours in the model aircraft in which
the accident occurred
Total day carrier landings
Total night carrier landings


The data items which reflect his proficiency (i.e. his 
recency and frequency of flying) are the following: 


Time all series this aircraft in last 90 days
Time this model this aircraft in last 90 days
Elapsed time since last previous flight
Time flown in the last 24 hours
Time flown in the last 48 hours
Number of missions flown in the last 24 hours
Number of missions flown in the last 48 hours
Number day carrier landings in last 30 days
Number night carrier landings in last 30 days
Instrument trainer time in last 90 days
Weapons system trainer time in last 90 days

The Individual Flight Activity Reporting System (IFARS),
a part of the Naval Safety Center, maintains flight records
on all naval aviators by fiscal year. The only data items
pertaining to pilot experience which are retrievable from
computer access for all fiscal years are:


Number of years designated a naval aviator
Total flying hours


At present, the following additional experience items are
retrievable by computer only from the beginning of fiscal
year 1969 and thus cannot be used as comparison variables,
since many of the aviators in both sample groups began
flying prior to 1969.


Total time by model 

Day and night carrier landings by model 
Other type landings by model 

Instrument time by model 


A new compilation is now in progress by the IFARS
section at the Naval Safety Center to record all flights on
computer tapes for all fiscal years for all pilots so that
future studies can be more encompassing.

The proficiency indicator data items for those pilots
in the accident group have a natural base point from which
to be measured. That is, an item such as "time flown in the
last 48 hours" means the last 48 hours directly prior to the
accident in which the pilot was involved. However, for
the non-accident (control) group, there is no such base
point from which to measure. Thus, comparison of proficiency
data items becomes rather nebulous.

One reasonable way to give significant meaning to the 
term proficiency is to artificially construct similar data 
items by an averaging procedure. For example, prior to each 
flight (for the period in question) compute the time flown 
in the preceding 48 hours. Do this for every flight during 
the fiscal year and then obtain an average time flown in 
the preceding 48 hours. The necessary data can be obtained 
from a detailed flight listing for the pilots in the control 
group for FY73. This procedure can be utilized for the 
following data items: 


Time all series this aircraft last 90 days 
Elapsed time since last previous flight 

Time flown in the last 24 hours 

Time flown in the last 48 hours 

Number of missions flown in the last 24 hours 
Number of missions flown in the last 48 hours 
Number day carrier landings in last 30 days 
Number night carrier landings in last 30 days 
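
The averaging procedure described above can be sketched as follows. This is only a minimal illustration, not the computation actually used in the study: the function name `average_time_in_window`, the (takeoff time, duration) record layout, and the rule that an earlier flight counts toward the window when its takeoff falls inside it are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def average_time_in_window(flights, window_hours=48):
    """Average, over all flights, of hours flown in the window before each flight.

    flights: list of (takeoff_datetime, duration_hours) tuples (hypothetical layout).
    An earlier flight is counted if its takeoff lies within the window.
    """
    flights = sorted(flights)
    window = timedelta(hours=window_hours)
    totals = []
    for i, (start, _) in enumerate(flights):
        # Sum the durations of earlier flights that began inside the window.
        prior = sum(d for (s, d) in flights[:i] if start - s <= window)
        totals.append(prior)
    return sum(totals) / len(totals) if totals else 0.0


# Three illustrative flights for one control-group pilot (made-up values).
log = [
    (datetime(1973, 1, 1, 8), 2.0),
    (datetime(1973, 1, 2, 8), 1.5),
    (datetime(1973, 1, 10, 8), 3.0),
]
avg_48 = average_time_in_window(log, window_hours=48)
```

The same loop, with a different window length or with a count in place of a sum of durations, would give the other averaged items in the list (missions flown, carrier landings).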


With these artificially constructed data items, one can
include proficiency in the comparison between the control
group and the accident group. The appropriateness of doing
this can be determined by comparing the results of
statistical analyses performed with and without these added variables.
If these added variables give a better delineation between
groups, then it is appropriate to include them.

To recap, the variables which are common to both groups
and which are used for the analysis are:

(X1) Number of years designated a naval aviator
(X2) Total flying hours
(X3) Time all series this aircraft last 90 days
(X4) Time since last previous flight
(X5) Time flown in the last 24 hours
(X6) Time flown in the last 48 hours
(X7) Number of missions flown in the last 24 hours
(X8) Number of missions flown in the last 48 hours
(X9) Number of day carrier landings in last 30 days
(X10) Number of night carrier landings in last 30 days







III. SELECTION OF GROUPS


The accident group was composed of all those pilots
who were involved in pilot error accidents during fiscal
year 1973. This group comprised 66 different pilots; no
pilot had more than one accident attributable to pilot error.
Due to incomplete data in two cases, this was reduced to
64 pilots.

The control group was more difficult to establish since
there were several thousand aviators from which to choose.
A subset of these pilots was obtained that satisfied two
criteria: (1) it appeared to be a sample representative of
all naval aviators, and (2) the data were relatively easy to
collect. The sample taken was the first 100 aviators on the
IFARS files. The IFARS files are ordered by increasing
social security number, and the increments between successive
numbers were very large. Examination of the biographical data
leads us to believe that social security numbers had no
bearing upon age, length of time in aviation duties, or even
length of time in the Naval Service. There was no obvious
reason to think that the sample was unrepresentative.

Of the 100 pilots initially assigned to the control
group, 20 were helicopter pilots and 15 were Naval Flight
Officers, thus leaving 65 subjects in the control group.
Since the size of the two groups under study is arbitrary,
a further reduction in the size of each group was made to
meet a computational constraint which was imposed by a
computer program employed in the actual analysis. Because
of the extensive computational effort required in the
analytical techniques used, the use of a digital computer
was mandatory. One of the computer programs used for the
analysis had a limitation of 100 data units. Therefore, a
random selection of 50 subjects was chosen for each of the
two groups under study. (The random selection was accomplished
in the manner of drawing numbers out of a hat.)
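
A present-day equivalent of drawing numbers out of a hat is sampling without replacement. The sketch below is illustrative only: the helper name `draw_sample`, the integer subject identifiers, and the seeds are assumptions, and the group sizes (64 and 65) are those stated above.

```python
import random


def draw_sample(subject_ids, k=50, seed=None):
    """Draw k distinct subjects at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return sorted(rng.sample(list(subject_ids), k))


# Hypothetical subject numbering: 1..64 accident pilots, 1..65 control pilots.
accident_sample = draw_sample(range(1, 65), k=50, seed=1973)
control_sample = draw_sample(range(1, 66), k=50, seed=1974)
```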







IV. INVESTIGATIVE APPROACHES 


The data describing the subjects are composed of ten
pieces of information for each subject. This constitutes
a multivariate data set. Therefore, some sort of
multivariate statistical technique is appropriate. Which
statistical techniques to employ depends upon the information
desired to be obtained from the analysis, and is the primary
topic addressed in this section.

As stated in the introduction, one of the primary
objectives is to establish a classification scheme and then to
determine if this classification is related to the occurrence
of accidents. One statistical procedure which treats this
problem is that of discriminant analysis. Discriminant
analysis is a multivariate statistical technique used for
constructing decision rules by which data units (subjects,
or pilots in the present context) can be classified as
members of one group or another.¹ The goal is to assign
subjects to the groups to which they have the greatest
resemblance based upon a profile of their characteristics,
while at the same time to minimize the effects of
misclassification.²


¹Anderberg, M.R., Cluster Analysis for Applications,
Academic Press, Inc., 1973

²Eisenbeis, R.A. and Avery, R.B., Discriminant Analysis
and Classification Procedures, p. 3, Lexington Books, 1972

The procedure constructs a discriminant function based
upon input data in which subjects are members of known groups.
This discriminant function is usually linear but can be
quadratic or have other forms. The data are used to make the
function specific (determine the parameters). Typically,
it is then used to reassign the original subjects to one of
the two groups on the basis of their characteristics in order
to make an empirical determination of the rate of
misclassification. If all subjects are reassigned to the group from
which they initially came, then there is zero percentage
misclassification and perfect discrimination between groups.
The discriminant function can also be used to categorize
other observations (subjects), whose group membership is
unknown, on the basis of their attributes.

If several (more than two) groups are present, then a 
set of discriminant functions is constructed to assign 
observations to the appropriate groups. 
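
The fit-then-reassign procedure can be illustrated with a small sketch. It is not the program used in this study: it assumes only two variables (so that the pooled covariance matrix can be inverted in closed form), equal prior probabilities and equal misclassification costs (so that the cutoff is the midpoint between the two group means), and made-up data. All function names are hypothetical.

```python
def mean2(rows):
    """Mean of a list of (x, y) pairs."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def pooled_cov2(g1, g2):
    """Pooled within-group covariance (sxx, sxy, syy) for two variables."""
    sxx = sxy = syy = 0.0
    for rows in (g1, g2):
        mx, my = mean2(rows)
        for x, y in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(g1) + len(g2) - 2
    return sxx / dof, sxy / dof, syy / dof


def fit_discriminant(g1, g2):
    """Return weights w = S^-1 (mean1 - mean2) and the midpoint cutoff."""
    (a1, b1), (a2, b2) = mean2(g1), mean2(g2)
    sxx, sxy, syy = pooled_cov2(g1, g2)
    det = sxx * syy - sxy * sxy
    dx, dy = a1 - a2, b1 - b2
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    cutoff = w[0] * (a1 + a2) / 2 + w[1] * (b1 + b2) / 2
    return w, cutoff


def classify(point, w, cutoff):
    """Assign to group 1 if the discriminant score exceeds the cutoff, else group 2."""
    return 1 if w[0] * point[0] + w[1] * point[1] > cutoff else 2


def correct_rate(g1, g2, w, cutoff):
    """Empirical rate of correct reassignment of the original subjects."""
    hits = sum(classify(p, w, cutoff) == 1 for p in g1)
    hits += sum(classify(p, w, cutoff) == 2 for p in g2)
    return hits / (len(g1) + len(g2))


# Made-up, well-separated groups.
group1 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group2 = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]
weights, cut = fit_discriminant(group1, group2)
rate = correct_rate(group1, group2, weights, cut)
```

Because the rate is computed on the same subjects used to fit the function, it is an optimistic estimate of how the function would perform on new subjects.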

A linear discriminant function will be constructed for
the two pilot groups on the basis of their experience and
proficiency characteristics. If the function discriminates
well, then one can determine what particular characteristics
have the strongest influence on placing a subject in the
accident group.³ Also, by applying the discriminant function
to subjects not in the original test groups one can determine
their accident potential.


³Press, S.J., Applied Multivariate Analysis, p. 376-379,
Holt, Rinehart and Winston, Inc., 1972







The assumptions upon which discriminant analysis is
based and the actual mathematics will be covered in the
next section.

If the discriminant function fails to separate the groups
without a high rate of misclassification, the lack of success
can be attributed to one of two causes. The first is that
the variables characterizing the subjects do not distinguish
between the groups to a strong enough degree, or the groups
overlap too much in the given measurement space. The second
is that the groups cannot be separated by a function of the
form chosen for the analysis. That is, instead of a
linear discriminant function, a quadratic or
more complex one may be needed.

To illustrate the preceding concept, let the accident
group be denoted by "A" and the control group by "C". Now,
if one considers the groups in two dimensions only (instead
of the actual ten) the groups might be clumped as in Figure
(1).





Figure (1) 

In this case a linear discriminant function would serve to
separate the groups well and it is not necessary to construct
a quadratic function. If, however, the data appeared as in
Figure (2), then one can see that a linear discriminant
function cannot discriminate among the groups without error.
However, a quadratic form of discriminant function such as
the curve depicted might very well have excellent
discriminating capabilities.





Figure (2) 


The linear discriminant function is a tool that is
immediately available in terms of computer programs. It
is based upon the assumption that the data came from a
multivariate normal population, and when this assumption is
met it works as well as any other discriminant function.
Other discriminant functions are not readily available for
use. Also, the linear discriminant function could do a good
job even if the multivariate normal assumption is not met,
i.e. when the natural separation of groups is so great that
even a simple method would do the job.







For the problem at hand, the use of the linear discrimi- 
nant function was encouraging, but since the assumption of 
multivariate normality is not appropriate (e.g. rotation 
policies split variables X, and Xo so that their distribu- 
tions are multimodal) it was decided to explore the nature 
of the data to see if a better job could be done. 

Exploratory data analysis on 100 points in Euclidian
10-space is not easy. Some form of cluster analysis is
called for, that is, cluster the subjects into groups.

This leads to the question of how many groups we actually
have and how the data are grouped.

Cluster analysis is actually a collection of techniques
that are used to group multidimensional entities according
to various criteria of their degrees of homogeneity or
heterogeneity.⁴ For example, in this problem grouping will
be on the basis of the values of each variable which
describes the pilot's flight experience and proficiency. Pilots
with high total flight time might tend to cluster into one
group while pilots with few carrier landings or with little
time since last flight might tend to cluster into other
groups. How close the values of the variables should be
before subjects are grouped into the same cluster is the
question of the degree of homogeneity desired, and how many

⁴Op. cit., Press, S.J., p. 408-411
clusters there should be is the question of the degree of
heterogeneity desired. This type of grouping is called 
grouping by subjects; that is, the entities are subjects. 
The entities can also be the variables themselves, in which 
case the clustering is said to be by attributes. 

There are several pertinent questions to bear in mind 
when performing a cluster analysis. How many clusters are 
inherent in the data? Since attributes may be measured in 
different units, should the attributes be standardized 
before they are clustered? How large should the errors be 
before they are considered intolerable? There will be one 
type of error made by not assigning similar entities into 
the same group, and another type of error made by grouping 
dissimilar entities into the same cluster. Should all
possible pairs of points (or attributes) be scrutinized for
similarities?⁵ Not all of these questions have definite
answers, but they will be addressed in the next section.

In most other statistical techniques, such as analysis
of variance, the variables usually possess some structure,
such as belonging to particular populations a priori. Consequently,
it is often possible to assume particular distributions for
the populations and make associated inferences. In clustering
problems, however, the principal concern is how to establish


⁵Ibid.
appropriate populations. Thus, clustering analysis logically
precedes the application of most other multivariate
procedures when the data do not possess structured form.⁷

There are two possible approaches to clustering. These
are enumerative procedures and non-enumerative procedures. Enumerative
means simply to list all the possible groupings of subjects
(or of attributes, if the clustering is by attributes).
The number of possible groupings is represented by a Stirling
Number of the Second Kind. For example, in clustering
twenty-five subjects into five groups there are between two
and three quadrillion possibilities from which to choose
the best grouping.⁸ This is not feasible even with a
computer, especially when the problem is much larger than this.
Some feasible non-enumerative techniques are described in
the next section.
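
The size of the enumeration problem can be checked directly. The sketch below uses the standard inclusion-exclusion formula for a Stirling number of the second kind, S(n, k) = (1/k!) Σ_{j=0}^{k} (-1)^j C(k, j) (k - j)^n; the function name `stirling2` is simply a label chosen for this example.

```python
from math import comb, factorial


def stirling2(n, k):
    """Number of ways to partition n labelled subjects into k non-empty groups."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always an exact multiple of k!


groupings = stirling2(25, 5)
print(f"{groupings:,} ways to place 25 subjects into 5 groups")
```

Even at a million partitions evaluated per second, a count of this magnitude would take decades to enumerate, which is why the non-enumerative methods are used.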

If through the use of cluster analysis one can find a
feasible set of groupings that have meaning to this problem
then the groupings can be used in a discriminant analysis
to obtain the desired classification procedure.

Clustering by variables can also prove to be worthwhile
in that it can help to determine if some of the variables
are redundant and not providing any additional information.

⁷Ibid.

⁸Op. cit., Anderberg, M.R., p. 23
If so, those redundant variables can be eliminated or
combined, thus simplifying the required computations. A
method of combining variables which was utilized was that
of principal components analysis.

Multivariate analysis by the principal components
model attempts to reduce the dimension of the problem while
retaining as much information (i.e. variation) contained in
the original data as possible. The method produces linear
combinations of the original variables which maximize the
variance of the resultant weighted sum. Thus attention is
centered primarily on the variable with the greater
variability by the appropriate assignment of the weights. This
linear combination of the variables is called the first
principal component and reduces our set of old variables to
one variable. If it is desired to extract more variance
from the data, one can construct a second principal component
which is orthogonal to the first. The process can be repeated
until there are as many components as original variables,
at which point one-hundred percent of the total
variance has been extracted.⁹
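
A small sketch may make the first principal component concrete. It finds the weight vector by power iteration on the sample covariance matrix, which is one of several ways to obtain the leading eigenvector and is an assumption of this example rather than the method used in the study. The function names and the data are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation tuples."""
    n, m = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[u] - mu[u]) * (r[v] - mu[v]) for r in rows) / (n - 1)
             for v in range(m)] for u in range(m)]


def first_principal_component(rows, iterations=500):
    """Return (unit weight vector, share of total variance it accounts for)."""
    cov = covariance_matrix(rows)
    m = len(cov)
    w = [1.0] * m  # starting vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[u][v] * w[v] for v in range(m)) for u in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    variance = sum(w[u] * cov[u][v] * w[v] for u in range(m) for v in range(m))
    total = sum(cov[u][u] for u in range(m))
    return w, variance / total


# Made-up example: the second variable is exactly twice the first, so a
# single component carries all of the variance.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
weights, share = first_principal_component(data)
```

When variables are measured in different units, the covariance matrix is commonly replaced by the correlation matrix so that no variable dominates merely because of its scale.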

The objective of principal components analysis is not
merely to reduce the size and complexity of the problem,
but also to glean information from the data which might not


⁹Op. cit., Press, S.J., p. 283-285







otherwise be obvious. Specifically, in the problem under
study here, the fifth, sixth, seventh and eighth variables
listed at the end of Section II can be regarded as prime indicators
of frequency of flying. However, when the data are analyzed
(by cluster analysis) the exact effects of these variables
might not readily be apparent. When all these variables
are combined into one variable (i.e. the first principal
component) the effect of frequency might be quite obvious.
That is, it might be observed that frequency of flying has
an inverse relationship with the occurrence of accidents.

For this analysis, of those variables listed at the end of
Section II, the first and second (years designated naval aviator
and total hours flown) were combined to get a "total
experience" variable; and the fifth, sixth, seventh and
eighth (time flown in the last 24 hours, time flown in the
last 48 hours, number missions flown in the last 24 hours,
and number missions flown in the last 48 hours) were combined
to get a "frequency" variable.

V. ANALYTICAL TECHNIQUES


The first analytic technique applied to the two groups
of pilots was that of discriminant analysis with the primary
objective being to develop an accurate linear discriminant
function. (Actually, the purposes of discriminant analysis
are first to determine if there is a difference among
population means or equivalently if there are any overlaps among
the groups, and second, to construct classification schemes
based upon the descriptive variables.)

There are three basic underlying assumptions of
discriminant analysis. They are (1) that the groups being
investigated are discrete and identifiable, (2) that each
observation (subject) in each group can be described by a set of
measurements on m characteristics or variables, and (3) that
these m variables are assumed to have a multivariate normal
distribution in each population and equal covariance matrices.
The first two assumptions are seen to be
satisfied as discussed in previous sections. The third
assumption indicates the need for separate statistical tests to
determine if the variables are multivariate normal and if the
covariance matrices are equal. It has been mentioned that
non-normal multivariate data do not necessarily bias the
results of a discriminant analysis. Also, since no
satisfactory tests exist for testing populations to be multivariate
normal, it is difficult to routinely test the normality
assumption. Finally, the central limit theorem suggests
that as the number of observations increases, the
discriminant values for each group approach a normal
distribution.¹⁰

The assumption of equality of covariance matrices
(i.e. equality of within group dispersions) appears to be
more critical in biasing the results. Eisenbeis and Avery
suggest that linear classification rules are not adequate
when unequal covariance matrices exist and that quadratic
classification rules should be employed.¹¹

The within group dispersion matrices for the two groups
of data were computed and are shown in Table VI in Appendix
C. The pooled within-groups dispersion matrix is also
shown. The group dispersion matrices were tested for equality
by the procedure given in Appendix D.

After satisfying the assumptions preparatory to the
actual analysis one can first test the equality of group
means. The null hypothesis is:


¹⁰Kirk, R.E., Experimental Design: Procedures for the
Behavioral Sciences, p. 62, Brooks/Cole Publishing Co.,
1968


¹¹Op. cit., Eisenbeis, R.A., and Avery, R.B., p. 16

$$H_0: \mu_1 = \mu_2$$

where

$$\mu_1 = (\mu_{1,1}, \mu_{1,2}, \ldots, \mu_{1,10})'$$

and

$$\mu_2 = (\mu_{2,1}, \mu_{2,2}, \ldots, \mu_{2,10})'.$$

The following steps are used by the BIMED04M computer
program to test for the equality of group means:

Step (1) -- the means for each group are computed

$$\bar{x}_k = (\bar{x}_{k,1}, \bar{x}_{k,2}, \ldots, \bar{x}_{k,10}), \quad k = 1, 2$$

Step (2) -- the differences in group means are computed

$$d = (\bar{x}_{1,1} - \bar{x}_{2,1},\; \bar{x}_{1,2} - \bar{x}_{2,2},\; \ldots,\; \bar{x}_{1,10} - \bar{x}_{2,10})$$

Step (3) -- the matrices $S_1$ and $S_2$ are computed where an
element of $S_k$ is given by

$$s^{(k)}_{uv} = \sum_{t=1}^{n_k} (x_{k,u,t} - \bar{x}_{k,u})(x_{k,v,t} - \bar{x}_{k,v})$$

Step (4) -- the matrix A is computed

$$A = (S_1 + S_2)^{-1}$$

where $(a_{i,1}, a_{i,2}, \ldots, a_{i,10})$ is the $i$th row of $A$

Step (5) -- the Mahalanobis $D^2$ statistic is computed

$$D^2 = (n_1 + n_2 - 2) \sum_{i=1}^{m} \sum_{j=1}^{m} a_{i,j}\, d_i\, d_j$$

Step (6) -- the F statistic is computed

$$F = \frac{n_1 n_2 (n_1 + n_2 - m - 1)}{m (n_1 + n_2)(n_1 + n_2 - 2)}\, D^2 \sim F_{m,\; n_1 + n_2 - m - 1}$$

where $n_1$ and $n_2$ are the respective sizes of the two
groups and m is the number of variables.¹²


The null hypothesis can be rejected when the value of the 
test statistic is greater than the tabled value of F for 
the desired level of significance. 
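
The six steps can be followed in a short sketch. It is not the BIMED program itself: it assumes the standard two-group form in which $D^2$ is built from the inverse of the summed cross-product matrices and F has m and n1 + n2 - m - 1 degrees of freedom, and it solves the linear system by Gaussian elimination rather than forming the inverse explicitly. Function names and data are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def cross_products(rows):
    """Sum of cross-products of deviations from the group mean (matrix S_k)."""
    mu, m = mean_vector(rows), len(rows[0])
    return [[sum((r[u] - mu[u]) * (r[v] - mu[v]) for r in rows)
             for v in range(m)] for u in range(m)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_f(g1, g2):
    """Return (D squared, F statistic, degrees of freedom) for two groups."""
    n1, n2, m = len(g1), len(g2), len(g1[0])
    d = [a - b for a, b in zip(mean_vector(g1), mean_vector(g2))]  # steps 1-2
    s1, s2 = cross_products(g1), cross_products(g2)               # step 3
    s = [[s1[u][v] + s2[u][v] for v in range(m)] for u in range(m)]
    x = solve(s, d)                              # step 4: (S1 + S2)^-1 d
    d2 = (n1 + n2 - 2) * sum(di * xi for di, xi in zip(d, x))      # step 5
    f = n1 * n2 * (n1 + n2 - m - 1) / (m * (n1 + n2) * (n1 + n2 - 2)) * d2
    return d2, f, (m, n1 + n2 - m - 1)                             # step 6


# Made-up two-variable data.
group1 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group2 = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]
d_squared, f_stat, dof = mahalanobis_f(group1, group2)
```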

The construction of the discriminant function is predi- 
cated upon minimizing the effects of misclassification and 
assigning subjects to the group to which they have the 


greatest resemblance. The effects of misclassification 


¹²BMD Manual, Biomedical Computer Programs, Health
Sciences Computing Facility, UCLA, University of California
Press, p. 211-220
depend upon the a priori knowledge of group membership and
the costs or penalties of misclassification. The BIMED
programs assume no special a priori probabilities of group
membership, i.e. the probability of belonging to either
group (in the two-group case) is one-half. They also assume
the costs of misclassification to be equal, i.e. the cost
of assigning an actual member of group number one to group
number two is the same as assigning a member of group number
two to group number one.

The measure of resemblance is determined by the m
characteristics which describe each subject. By substituting
the values of the characteristics into each group's
probability density function it is determined how closely the
subject resembles the group as compared with the rest of the
population. The BIMED programs yield the coefficients and
constants for the linear discriminant function for each
group in the total population.¹³

In order to determine what effect the chosen variables
had on proficiency and experience it was desirable to
measure the association among the variables. The association
measure employed was the product-moment correlation
coefficient. The correlation computations and correlation matrix
for the entire data set are given in the appendix.
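
For reference, the product-moment correlation coefficient between two variables is their scatter divided by the square root of the product of their individual sums of squares. A minimal sketch follows; the function names and the column-wise data layout are assumptions of the example.

```python
def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns):
    """Correlation of every variable (one list per variable) with every other."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Made-up columns: the second rises with the first, the third falls.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
r = correlation_matrix(cols)
```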


¹³Ibid.

The problem of how to group the variables given this
association measure can be solved through the use of
hierarchical clustering techniques. These techniques can also
be useful to cluster by data units (subjects) which have a
different association measure.

For the association measure among data units, most
investigators use metric measures when the data units are
described by interval variables. Metric measures must
satisfy certain properties. If E is a given measurement
space and X, Y, and Z are points in E, then an association
function D is a metric measure if and only if it satisfies
the following conditions:¹⁴

(1) $D(X,Y) = 0$ if and only if $X = Y$
(2) $D(X,Y) \geq 0$ for all X and Y in E
(3) $D(X,Y) = D(Y,X)$ for all X and Y in E
(4) $D(X,Y) \leq D(X,Z) + D(Y,Z)$ for all X, Y and Z in E

The most common metric measure is the Euclidian distance
function,

$$D_2(X_i, X_j) = \left[ \sum_{k=1}^{n} (x_{ki} - x_{kj})^2 \right]^{1/2}.$$

This is a special
case of the general class of metrics called Minkowski metrics
which have the form

$$D_p(X_i, X_j) = \left[ \sum_{k=1}^{n} |x_{ki} - x_{kj}|^p \right]^{1/p}$$

where $p \geq 1$ and $X_j = (x_{1j}, x_{2j}, \ldots, x_{nj})^T$ is the vector


¹⁴Op. cit., Anderberg, M.R., p. 98-102
of scores on the jth data unit. In this analysis the
Euclidian distance function was used to cluster the data
units.
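
The Minkowski family, with the Euclidian distance as the case p = 2, can be written in a few lines. The function name `minkowski` and the sample points are chosen only for illustration.

```python
def minkowski(x, y, p=2):
    """Minkowski distance between two score vectors; p = 2 is Euclidian."""
    if p < 1:
        raise ValueError("p must be at least 1 for the metric conditions to hold")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Made-up score vectors for three data units.
u, v, w = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)
euclid = minkowski(u, v)         # p = 2
city_block = minkowski(u, v, 1)  # p = 1
```

Because the variables in this study are measured in different units (years, hours, counts), the distance is sensitive to scale, which is the reason the standardization question is raised in the preceding section.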

The hierarchical methods are used to construct a tree
(dendrogram) depicting the relationship among the entities.
The entities are grouped into clusters in order of their
association measures or similarities. The ordering provides
a hierarchy, thus the name. The similarities can be of many
forms of association measures; the general term applied to
the matrix being a similarity matrix.

A breakdown of hierarchical methods yields agglomerative
and non-agglomerative procedures. The agglomerative
procedures start with the branches (each entity) and combine these
entities until there is but one remaining cluster (the root).
The alternative procedures work from the root backward.
Only the former was used in this analysis.

There are many actual techniques and criteria of
hierarchical clustering. Initially each entity is considered to
be a cluster of one. The first method searches the similarity
matrix for the pair of entities with the highest degree of
association (e.g. largest correlation among the variables)
and groups these two entities. It then searches all
remaining clusters and groups those two clusters which are closest,


¹⁵Ibid.







i.e. the correlation among their closest members is highest.
This step is repeated until there is but one cluster
remaining. This method is called the "single linkage" method by
Anderberg or the "connectedness" method by Johnson.¹⁶ The
names derive from the fact that each cluster is joined by
the single shortest or strongest link (thus most strongly
connected) between them.

The second procedure, called complete linkage, is the 
same as single linkage except that the association between 
groups is the association between their farthest members. 
Johnson calls this the diameter method because all entities 
in a cluster are linked to each other at some maximum distance 
(or diameter). 
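
The two linkage rules differ only in how the association between two clusters is measured, as the following sketch shows. It is illustrative rather than a description of the programs used here: it works with Euclidian distance between data units (smaller is closer) instead of correlation between variables, and the function name, the tuple representation of clusters, and the data are assumptions.

```python
import math


def agglomerate(points, linkage="single"):
    """Agglomerative clustering; returns the merge history.

    Each history entry is (cluster_a, cluster_b, distance), where clusters
    are tuples of indices into points.
    """
    # Single linkage uses the closest pair of members, complete the farthest.
    pick = min if linkage == "single" else max
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pick(math.dist(points[a], points[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history


# Made-up one-dimensional data units.
units = [(0.0,), (1.0,), (3.0,), (7.0,)]
single_levels = [d for _, _, d in agglomerate(units, "single")]
complete_levels = [d for _, _, d in agglomerate(units, "complete")]
```

The levels at which the merges occur are what a dendrogram displays; complete linkage joins the same units here but at larger distances, reflecting the diameter of each resulting cluster.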

Hierarchical clustering is usually not too enlightening
for the clustering of data units. The non-hierarchical
methods are more appropriate for classifying the data units
into a single classification of k clusters. "The basic
concept in most of the non-hierarchical methods is to begin
with an initial partition of the data units and adjust the
cluster memberships to obtain a best partition."

The simplest and most common non-hierarchical clustering
procedure is that of centroid sorting. Beginning with the
initial partition of k clusters (each usually consisting of


¹⁶Johnson, S.C., "Hierarchical Clustering Schemes",
Psychometrika, Vol. 32, No. 3, p. 241-254, 1967
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one data unit) a new data unit is assigned to the cluster 
with the nearest centroid by some sort of distance measure. 
Centroids are recomputed after a data unit’ is assigned and 
the procedure repeated for remaining data units. After all 
data units are assigned, the entire procedure can be reapplied 
Cowall data units Over and ever until there are no more 
changes in cluster memberships, i.e. until eonverzence.-! 
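
The assignment and reassignment passes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not McCrae's K-MEANS program: the first k data units seed the clusters, squared Euclidean distance is the distance measure, and the function name and sample points are hypothetical.

```python
def nearest_centroid_sort(points, k, max_passes=100):
    """Cluster points (tuples of floats) into k groups by nearest centroid."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def centroid(members):
        return tuple(sum(col) / len(members) for col in zip(*members))

    # Initial partition: the first k data units seed one cluster each.
    clusters = [[p] for p in points[:k]]
    centroids = [centroid(c) for c in clusters]
    labels = list(range(k)) + [None] * (len(points) - k)

    # First pass: assign each remaining unit and update that centroid at once.
    for idx in range(k, len(points)):
        j = min(range(k), key=lambda c: dist2(points[idx], centroids[c]))
        clusters[j].append(points[idx])
        centroids[j] = centroid(clusters[j])
        labels[idx] = j

    # Reassignment passes: repeat until no membership changes (convergence).
    for _ in range(max_passes):
        changed = False
        for idx, p in enumerate(points):
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            old = labels[idx]
            if j != old and len(clusters[old]) > 1:  # never empty a cluster
                clusters[old].remove(p)
                clusters[j].append(p)
                centroids[old] = centroid(clusters[old])
                centroids[j] = centroid(clusters[j])
                labels[idx] = j
                changed = True
        if not changed:
            break
    return labels

# Hypothetical two-variable data units, for illustration only.
sample = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
sample_labels = nearest_centroid_sort(sample, k=2)
```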
There are more complex methods than the centroid methods
for clustering data units and these are based on multivariate
statistical analysis techniques. The scatter of two variables
is the inner product of two centered score vectors. The
scatter matrix $T$ is a square matrix that has the entry $t_{ij}$
which is the scatter of variables $i$ and $j$ computed over all
data units. Each of the $k$ clusters has its own scatter
matrix $W_h$ computed over the data units in the $h$th cluster.
The within groups scatter matrix is given by

$$W = \sum_{h=1}^{k} W_h .$$

The between groups scatter matrix is denoted by $B$ with
element

$$b_{ij} = \sum_{h=1}^{k} m_h \bar{x}_{ih} \bar{x}_{jh},$$

where $m_h$ is the number of data units in the $h$th cluster and
$\bar{x}_{jh}$ is the mean (centered around the grand mean in the
entire data set) of the $j$th variable in the $h$th cluster.
The three scatter matrices can be shown to satisfy the
relation $T = B + W$.18

17 Op. cit., Anderberg, M.R., p. 156-173
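
The relation T = B + W can be checked numerically. The sketch below uses a small made-up data set of two clusters in two variables; the helper names and the data are hypothetical and serve only to show how the three scatter matrices are formed.

```python
def mean(rows):
    """Column means of a list of data units."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, center):
    """Scatter matrix of rows about a given center."""
    p = len(center)
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical data: two clusters of two-variable data units.
clusters = [
    [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]],
    [[8.0, 9.0], [9.0, 7.0]],
]
all_rows = [r for c in clusters for r in c]
grand = mean(all_rows)
p = len(grand)

# Total scatter: all data units about the grand mean.
T = scatter(all_rows, grand)

# Within groups scatter: each cluster about its own mean, summed.
W = [[0.0] * p for _ in range(p)]
for c in clusters:
    W = add(W, scatter(c, mean(c)))

# Between groups scatter: cluster means centered on the grand mean,
# weighted by the number of data units in each cluster.
B = [[0.0] * p for _ in range(p)]
for c in clusters:
    m_h = len(c)
    d = [a - g for a, g in zip(mean(c), grand)]
    B = add(B, [[m_h * d[i] * d[j] for j in range(p)] for i in range(p)])

total = add(B, W)  # agrees with T up to rounding error
```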
An important element in many clustering criteria is
the determinantal equation $|B - \lambda W| = 0$. The eigenvalues
of the matrix $W^{-1}B$ provide the $\lambda$ solutions to this equation.
D. J. McCrae has developed a FORTRAN IV computer program
called K-MEANS which utilizes these concepts to cluster the
data into k clusters. He provides for four possible criteria
for determining when assignment of a data unit to a particular
cluster results in the "best" partition of the data set.
These criteria are: (1) minimize the trace of $W$; (2) maximize
the largest eigenvalue of $W^{-1}B$; (3) maximize the trace of
$W^{-1}B$; and (4) minimize the ratio of the determinants $|W|/|T|$.
This last criterion is more commonly known as Wilks' Lambda
statistic. Since $T$ is the same for all partitions, this is
equivalent to minimizing det $W$. The last procedure was the
one used to cluster the data units in this particular analysis.19
McCrae's K-MEANS also allows three choices of distance
measures between clusters. These are Euclidean distance,
scaled Euclidean distance, and Mahalanobis distance. Assuming
normal populations, $N(\theta_i, \Sigma_i)$, with equal covariance
matrices $\Sigma_1 = \Sigma_2 = \dots = \Sigma$ so that the populations differ
only in location, the Mahalanobis distance between the
populations is given by $D^2 = (\theta_1 - \theta_2)' \Sigma^{-1} (\theta_1 - \theta_2)$. This
was the distance measure used in this cluster analysis.20

18 Op. cit., Anderberg, M.R., p. 173-176
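
For two variables the Mahalanobis distance can be written out directly, since a 2x2 covariance matrix has a closed-form inverse. The sketch below is an illustration only; the function name and the example numbers are hypothetical, and a general program would invert a matrix of any order.

```python
def mahalanobis_sq(theta1, theta2, sigma):
    """Squared Mahalanobis distance between two mean vectors of length 2,
    given the common 2x2 covariance matrix sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [theta1[0] - theta2[0], theta1[1] - theta2[1]]
    return sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))

# With an identity covariance matrix the result is the squared Euclidean distance.
d_identity = mahalanobis_sq((0.0, 0.0), (3.0, 4.0), [[1.0, 0.0], [0.0, 1.0]])

# A larger variance on the first variable shrinks its contribution.
d_scaled = mahalanobis_sq((0.0, 0.0), (3.0, 4.0), [[4.0, 0.0], [0.0, 1.0]])
```

The second call shows why the measure suits variables of very different magnitudes: a difference of 3 units counts for less when that variable has variance 4.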


The question of how many clusters are present in the 
data was mentioned in the previous section. It can be shown 
that one prime indicator of the discriminability of variables 
in the data set is given by the log of the ratio det T/det W. 
When this quantity is plotted against the number of clusters 
one can gain insight as to the appropriate number of clusters 
within the data set. As the number of clusters is increased 
the ratio begins to reach a stabilizing value indicating 
that the discriminability of the data is decreasing. Thus, 
one can approximate the maximum number of natural clusters 
by observing when the curve levels off. It should be 
reemphasized that it is a primary objective of most cluster 
analysis problems to produce a set of clusters that are well 
differentiated from each other. 
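
The quantity log(det T / det W) can be computed for any given partition as sketched below. The code is a minimal illustration with hypothetical helper names and made-up data; the determinant is found by cofactor expansion, which is adequate only for a small number of variables.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def scatter(rows):
    """Scatter matrix of rows about their own mean."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    p = len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def log_det_ratio(clusters):
    """log(det T / det W) for a partition given as a list of clusters."""
    all_rows = [r for c in clusters for r in c]
    T = scatter(all_rows)
    parts = [scatter(c) for c in clusters]
    p = len(T)
    W = [[sum(part[i][j] for part in parts) for j in range(p)]
         for i in range(p)]
    return math.log(det(T) / det(W))

# Hypothetical data: the same six data units as one cluster and as two.
group_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
group_b = [[8.0, 9.0], [9.0, 7.0], [10.0, 10.0]]
one_cluster = [group_a + group_b]
two_clusters = [group_a, group_b]

ratio_one = log_det_ratio(one_cluster)    # T equals W, so the log is zero
ratio_two = log_det_ratio(two_clusters)   # separation makes det W smaller
```

Repeating the calculation for partitions with increasing numbers of clusters and plotting the results gives the curve whose leveling off is used to judge the number of natural clusters.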

As stated before, when cluster analyses are performed 
on data with several variables actually measuring the same 
characteristic, it might be profitable to reduce the problem 
to one of only a few primary variables by the techniques of 


principal components analysis. 


20 Op. cit., Press, S.J., p. 372-323


For this analysis, the computer program BIMED01M was
utilized to extract the first principal component from the
first two variables and the first principal component from
the fifth, sixth, seventh and eighth variables. BIMED01M
performs the following four basic steps: (1) the data are
normed and centered; (2) the correlation matrix of the
centered and normed data is computed; (3) the eigenvalues
and corresponding eigenvectors of the correlation matrix
are calculated; and (4) the centered and normed data are
transformed into their orthogonal components.21
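
The four steps can be illustrated for the two-variable case, where the correlation matrix is 2x2 and its eigenvalues and eigenvectors are known in closed form. This is a sketch under that assumption and not a reproduction of the program itself; the function name and sample data are hypothetical, and combining four variables would require a general eigenvalue routine.

```python
import math

def first_principal_component(x, y):
    """First principal component of two variables given as lists of floats.

    Returns the leading eigenvalue of the correlation matrix and the
    component score of each data unit.
    """
    n = len(x)

    def standardize(v):
        mean = sum(v) / n
        sd = math.sqrt(sum((a - mean) ** 2 for a in v) / (n - 1))
        return [(a - mean) / sd for a in v]

    # Step 1: center and norm the data.
    zx, zy = standardize(x), standardize(y)
    # Step 2: the correlation matrix is [[1, r], [r, 1]].
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
    # Step 3: its eigenvalues are 1 + |r| and 1 - |r|; the leading
    # eigenvector is (1, sign(r)) / sqrt(2).
    sign = 1.0 if r >= 0 else -1.0
    eigenvalue = 1.0 + abs(r)
    vector = (1.0 / math.sqrt(2), sign / math.sqrt(2))
    # Step 4: project each standardized data unit onto the eigenvector.
    scores = [vector[0] * a + vector[1] * b for a, b in zip(zx, zy)]
    return eigenvalue, scores

# Hypothetical perfectly correlated variables, for illustration only.
eigenvalue, scores = first_principal_component([1.0, 2.0, 3.0, 4.0],
                                               [2.0, 4.0, 6.0, 8.0])
```

The variance of the scores equals the leading eigenvalue, which is the share of the total standardized variance carried by the combined variable.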


21 BMD Manual, Biomedical Computer Programs, Health
Sciences Computing Facility, UCLA, University of California
Press, 1973, p. 193-201


VI. RESULTS OF ANALYSIS


The two data groups, control and accident, were first
investigated by discriminant analysis with the use of the
computer program BIMED04M.

The test for equality of group covariance matrices (or
equivalently, group dispersion matrices) was performed
according to the procedure developed by G. E. P. Box and
illustrated in Appendix D. They were found to be equal at
the .10 level of significance so it was appropriate to apply
the discriminant analysis procedures.

Testing for the equality of group means, BIMED04M
computed an F statistic of 8.94. For the $\alpha = .001$ level of
significance, the tabled F value is $F_{m,\,n_1+n_2-m-1}(1 - \alpha) =
F_{10,89}(1 - .001) = F_{10,89}(.999) = 3.39$ and one can conclude
that there is definitely a difference in location of group
means.

The computed discriminant function coefficients were
(-0.00152, 0.00001, -0.00035, 0.00360, -0.00988, 0.00685,
0.00218, -0.00191, -0.00245, 0.00231). If after applying
the coefficients to a given data vector $X_j$,

$$-0.00152x_{j1} + 0.00001x_{j2} + \dots + 0.00231x_{j,10} < 0$$

then data unit $j$ is assigned to group number two. Otherwise,
the data unit is assigned to group number one.
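
The assignment rule amounts to taking the sign of a weighted sum. The sketch below uses the ten coefficients listed in the text; the function names and the two test vectors are hypothetical and are not data units from the study.

```python
# Discriminant function coefficients for variables 1 through 10,
# as listed in the text.
COEFFICIENTS = (-0.00152, 0.00001, -0.00035, 0.00360, -0.00988,
                0.00685, 0.00218, -0.00191, -0.00245, 0.00231)

def discriminant_score(x):
    """Weighted sum of the ten variables for one data unit."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected one value per variable")
    return sum(c * v for c, v in zip(COEFFICIENTS, x))

def assign_group(x):
    """Group two if the score is negative, otherwise group one."""
    return 2 if discriminant_score(x) < 0 else 1

# Hypothetical data units: all variables zero except one.
only_var5 = [0.0] * 10
only_var5[4] = 10.0   # variable five carries a negative coefficient
only_var6 = [0.0] * 10
only_var6[5] = 10.0   # variable six carries a positive coefficient

group_for_var5 = assign_group(only_var5)
group_for_var6 = assign_group(only_var6)
```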


Those subjects who had high values for the variables
with positive coefficients and low values for the variables
with negative coefficients were classified as being in the
control group, and those with opposite attributes were
classified as belonging to the accident group.

The discriminant function was applied to the original 
data units to determine the performance of the function. 
Fifteen subjects of the fifty in the control group were 
classified as being in the accident group, while only four 
of the fifty in the accident group were classified as being
in the control group. It is important to observe that
although the overall misclassification rate is nineteen
percent, the misclassification rate of the original accident
group is only eight percent. This is encouraging. The
question of identifying correctly those in the accident
group is of greater concern than that of misclassifying
those individuals in the control group.
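
The quoted rates follow directly from the classification counts, as the short calculation below shows. The counts are those given in the text; the variable names are chosen here for illustration.

```python
control_total, accident_total = 50, 50
control_misclassified = 15    # control subjects assigned to the accident group
accident_misclassified = 4    # accident subjects assigned to the control group

overall_rate = ((control_misclassified + accident_misclassified)
                / (control_total + accident_total))
control_rate = control_misclassified / control_total
accident_rate = accident_misclassified / accident_total
correct_rate = 1 - overall_rate

print(f"overall misclassification:  {overall_rate:.0%}")
print(f"control group misclassified: {control_rate:.0%}")
print(f"accident group misclassified: {accident_rate:.0%}")
print(f"correctly classified:        {correct_rate:.0%}")
```

Most of the error therefore comes from control subjects placed in the accident group, at a rate of thirty percent for that group.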

To obtain the preceding results, it should be noted 
that the discriminant analysis was performed on the raw data 
as listed in Appendix A. An analysis was also performed on 
the standardized data, listed in Appendix B, but the results 
were much poorer. Using standardized data, the overall 
misclassification rate was fifty-five percent, quite a loss 
of discriminating power. It should be recognized that
standardizing data has the drawback of providing answers to
a problem different from the one originally posed.22

22 Op. cit., Press, S.J., p. 416

In addition to learning the misclassification rates, it
was also desired to determine which variables had the
strongest effect on classifying the data units and in which
direction the effect was observed. The discriminant function
coefficients indicate whether each variable has a positive
or negative effect, but because of the difference in magnitudes
of the variables, the discriminant function coefficients
alone do not tell how much of an effect. It is of interest,
therefore, to compare how much a one standard deviation
change in each variable will affect the discriminant function.
Table I presents the standard deviations of each variable in
the second column, the discriminant function coefficients in
the third column, and in the last column the effect on the
discriminant function of a one-sigma change in each variable.


TABLE I


Variable    Standard     Discriminant Function    Effect of a
            Deviation    Coefficient              1σ Change


1
2
3
4
5
6
7
8
9
10

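
The last column of Table I is the product of each coefficient and the standard deviation of its variable. The sketch below shows the calculation with the coefficients listed in the text; the standard deviations in example_sd are placeholders invented for illustration and are NOT the values of Table I.

```python
# Discriminant function coefficients for variables 1 through 10,
# as listed in the text.
COEFFICIENTS = (-0.00152, 0.00001, -0.00035, 0.00360, -0.00988,
                0.00685, 0.00218, -0.00191, -0.00245, 0.00231)

def one_sigma_effects(std_devs):
    """Change in the discriminant function for a one standard deviation
    change in each variable."""
    return [c * s for c, s in zip(COEFFICIENTS, std_devs)]

# Placeholder standard deviations (hypothetical): every variable is given
# unit spread except variable two, which is given a spread of 2000.
example_sd = [1.0] * 10
example_sd[1] = 2000.0

effects = one_sigma_effects(example_sd)
```

With these placeholder values the smallest coefficient (0.00001 on variable two) yields the largest effect, which is why the coefficients alone do not show the strength of each variable.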
The results of Table I indicate that variable two (total
hours) has the strongest positive effect in classifying a
subject as not being in the accident group. A surprising
result, however, is that variable five (time flown in the
last twenty-four hours) has the strongest negative effect
while variable six (time flown in the last forty-eight hours)
has a strong positive effect. This would suggest that flying
every other day is beneficial, but that too much flying (i.e.
every day) is detrimental. Similar interpretations can be
made for the remaining variables although their effects are
less pronounced.

Although an overall misclassification rate of nineteen 
percent tends to indicate that there are meaningful differ- 
ences between the two groups, the classification capabilities 
of the discriminant function are not as sharp as one would 
like. One cannot say with assurance how a pilot not
initially a member of either group should be classified.

It was desired to learn more about the variables' effects
to be able to apply conclusions to subjects beyond the range
of the data. To do this, a second method of analysis was
employed: that of cluster analysis.

The first type of cluster analysis used was hierarchical 
clustering by data units. The computer program HI-CLUST was 
used with Euclidean distance measure between data units as 
the indicator of association. The results from both the 
single linkage and complete linkage methods were not at
all satisfactory. When clustered into the final two groups,
one cluster consisted of ninety-nine units and the other of
a single unit. The cluster of ninety-nine units was composed
of clusters of ninety-three units and six units, again
shedding no light on relation to accidents. Therefore, the
clustering on data units was reworked using the non-hierarchical
techniques of the computer program K-MEANS.

Initially, the one hundred data units were clustered 
into two groups to ascertain if there was any association 
directly with the two original groups, control and accident. 
Unfortunately, there did not appear to be any association, 
as cluster number one contained thirty-three subjects from 
the control group and thirty-seven from the accident group 
while cluster number two had seventeen and thirteen, 
respectively. 

Figure (3) graphically depicts the cluster means of the 
two-group cluster results, and the number of subjects in the 
clusters. It is interesting to note that fifty-three percent 
of cluster number one was composed of subjects from the 
accident group while only forty-three percent of cluster 
number two was from the accident group. By inspecting the 
cluster means of variables one and two, one can see that the 
cluster compositions are inversely related to total experience, 
i.e. cluster number one has higher accident composition and 
fewer years designated naval aviator and fewer total hours.
The same kind of relation is seen to apply to the recency and
frequency variables (variables four through ten), but the
separation is not as great. Cluster number one, which has the
higher accident composition, has cluster means which indicate
more recent and frequent flying than the subjects of cluster
number two. Again, the results here are in basic agreement
with those of discriminant analysis in that they indicate
less frequent flying is beneficial. But of course, the
support is very thin and the results are far from conclusive,
especially since the cluster means are seen to be relatively
close for all of variables four through ten.
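
The accident composition of each cluster is the share of its members drawn from the accident group. The short calculation below uses the counts reported for the two-cluster result; the names are chosen here for illustration.

```python
# Counts of subjects in each cluster, by original group.
clusters = {
    1: {"control": 33, "accident": 37},
    2: {"control": 17, "accident": 13},
}

def accident_share(counts):
    """Fraction of a cluster's members that come from the accident group."""
    return counts["accident"] / (counts["control"] + counts["accident"])

# Percentages rounded to the nearest whole percent.
shares = {k: round(100 * accident_share(v)) for k, v in clusters.items()}
print(shares)
```

A cluster that matched the accident group closely would show a share near one hundred percent; shares near fifty percent indicate little association between cluster and group.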

It was stated in Section V that it is possible to get
a rough idea of the number of natural clusters present in 
the data by plotting log(det T / det W) versus the number of 
clusters. Figure (4) is a plot of this information for the 
data under study. As the number of groups is increased the 
curve begins to level off. It appears that beyond nine groups 
there is not much additional information to be gained by 
grouping further. 

The primary interest lies in the analysis of two groups, 
since there were two groups initially, and in the analysis 
of the natural number of groups. Between two and nine groups 
the results are believed to be less meaningful.

Figure (5) graphically portrays the cluster means of the
nine-cluster results and the number of data units in each
cluster. The relationships among clusters here are not
apparent and there is no one-to-one correspondence such as
an inverse relation between the cluster means of total hours
flown and composition of clusters by accident percentages.

Figure (4): log (det T / det W) versus number of clusters

It is desirable, therefore, to plot the proportion of each
cluster from the accident group versus the cluster means
for each variable. By so doing, trends might appear and
factors influencing the accident proportions might become
more readily observable. These plots are depicted in Appendix
E as Figures (10) through (19). Figures (10) through (19)
are similar in that none of them reveals any prominent
relationship that its respective variable has with the
proportion of the clusters composed of accident subjects.
Intuitively, one might have hypothesized that as the cluster
means increased (as in Fig. (11) for instance) the
proportion of the clusters composed of accident units would
decrease. Since this kind of relationship did not appear
for the total hours variable, nor did similarly anticipated
relations hold for the other variables, a final type of
analysis was performed on the data.

It seemed plausible that although the variables individ- 
ually did not reflect the contributions they had upon 
accidents, certain variables collectively might demonstrate 
such an effect. To determine which variables to combine, a
hierarchical clustering analysis was performed by the computer 
program HI-CLUST. Product-moment correlation was used as 
the association measure between variables. Both the methods 
of single linkage and complete linkage clustering as discussed 
in Section V were employed. The results are shown as 
hierarchical trees (dendrograms) in Figures (6) and (7). 

The results of both hierarchical methods are similar. 
Variables number one and two are highly correlated and 
variables five, six, seven and eight are highly correlated. 
Therefore, it was decided to combine those respective variables, 
calling the first the experience variable and the second the 
frequency variable. In order to eliminate all unnecessary or 
distracting influences it was also considered prudent to 
eliminate variables nine and ten since ASG) AOS accidents 
involved carrier landings and many subjects in both groups 
were not involved in carrier operations during the period 
investigated. 

As discussed in Sections IV and V, BIMED01M was used to
extract the first principal components from those combinations
of variables listed above to obtain the total
experience variable and the frequency variable. The principal
components (exhibited in Appendix G) were extracted from the
standardized data (exhibited in Appendix B) as required by
BIMED01M.

With the data reduced to four main variables, the cluster 
analysis program K-MEANS was again used to investigate the 
data. As was done with the data in ten variables (or 
regular space) the data was first investigated by clustering
into just two groups. The cluster means are depicted in 
Figure (8). As was true in the regular space analysis, there 
does not appear to be any association between clustering of 
data units and membership in the accident group. Again, 
there were thirty-three subjects from the control group and 
thirty-seven from the accident group in cluster number one, 
and seventeen and thirteen respectively in cluster number
two. Thus, fifty-three percent of cluster number one was
from the accident group and forty-three percent of cluster 
number two was from the accident group. 

The graph of log (det T / det W) versus number of
clusters was plotted for the reduced space analysis as
Figure (9) and was also found to indicate that beyond nine
clusters, minimal information was gained. Therefore, a plot
of the proportion of clusters from the accident group versus
the cluster means was constructed for each of the four
variables in the reduced space with nine cluster groupings.
Figures (20) through (23) in Appendix H are graphs of the
results.

The resultant plots in Appendix H are not too informative.
Three of the four "new" variables do not appear to reveal any
structure, but the first, the experience variable, may be of
some interest. A parabolic fit has been drawn in freehand
and the accident rate seems to bottom out for experience in
the interval (-1,0). This is misleading, however. The
interval (-1,0) of the experience variable corresponds to
values of $X_1$ and $X_2$ which are between modes of their respective
distributions. Only seven of the one hundred aviators
are in this range.
VII. CONCLUSIONS


The three analytical methodologies employed in this
investigation were primarily utilized as exploratory tools
to determine if there were significant differences in the
various flight time statistics recorded for sample groups
of pilots with and without accidents. The discriminant
analysis techniques provided the best indication that there 
were differences which could be used to categorize the 
pilots according to the probability of belonging to the 
accident group. 

It should be recognized that failure to distinguish
among subjects according to their flight statistic attributes
is not necessarily a fault of the analytical procedures, but
an inherent inability of the data as currently conceived to
discriminate among subjects. This does not suggest, however,
that this approach to accident analysis has no merit. It
does point out the need to expand the investigation to include
more quantitative aspects of flying. Many other
variables such as instrument time, synthetic trainer time,
number of instrument approaches, average time spent briefing
flights, and subjective attributes such as training command
flight grades and NATOPS quiz grades could be included.
Breaking the investigation down into many more restrictive
areas such as including only accidents in a particular phase
of flight or including only accidents by a particular type
of aircraft such as attack or patrol might also prove to
be more relevant. It should be informative to expand the
time span of the data base to include five or ten years so
as to have a larger sample size on which to base results.
Also, enlarging the size of the control group would help
to eliminate the effects of non-randomness which could bias
the data.

Despite the fact that the data investigated in this 
analysis did not contain those characteristics which could 
identify the underlying accident generating mechanism, it 
is still considered worthwhile to pursue the basic ideas 


developed here in future accident analysis. 
APPENDIX B


TABLE IV — Control Group Standardized Data 
TABLE V — Accident Group Standardized Data


APPENDIX C


DISPERSION MATRICES 


An element of the group dispersion matrix is given by 


the formula 





where i = 1,2,...,10 and j = 1,2,...,10. An element of
the dispersion matrix is readily seen to differ from an
element of the covariance matrix only by the factor N - 1
where N is the number of data units or observations.

An element of the pooled within groups dispersion matrix
is given by the formula

$$s_{ij} = \frac{1}{N_1 + N_2 - 2} \sum_{g=1}^{2} \sum_{k=1}^{N_g} (x_{ikg} - \bar{x}_{ig})(x_{jkg} - \bar{x}_{jg})$$

where i = 1,2,...,10 and j = 1,2,...,10, and $N_1$ and $N_2$
are the number of observations of the respective groups.
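
A pooled within groups matrix of this kind can be computed as sketched below. The sketch assumes the divisor N1 + N2 - 2 and deviations taken about each group's own mean; the function name and the two small groups are hypothetical.

```python
def pooled_dispersion(group1, group2):
    """Pooled within groups dispersion matrix for two groups of data units.

    Cross products of deviations from each group's own mean are summed
    over both groups and divided by N1 + N2 - 2.
    """
    groups = (group1, group2)
    p = len(group1[0])
    means = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    dof = len(group1) + len(group2) - 2
    return [[sum((row[i] - m[i]) * (row[j] - m[j])
                 for g, m in zip(groups, means) for row in g) / dof
             for j in range(p)] for i in range(p)]

# Hypothetical two-variable groups, for illustration only.
g1 = [[1.0, 2.0], [3.0, 4.0]]
g2 = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
S = pooled_dispersion(g1, g2)
```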
TABLE VI — Control Group Dispersion Matrix


Variable 
ai 2 3 + zy 
1 ee 7700-41 ~25.90 Onval 0.58 
9 0.77 Oe27 0.34 3.10 ia
10 0-32 0.07 0.09 


1e11 Oeo7 
TABLE VII — Accident Group Dispersion Matrix


TABLE VIII — Pooled Within Groups Dispersion Matrix


Variable 
1 2 3 yy 5 
1 B7.05 7238236 039 =28..78 1807 1 O.00 
APPENDIX D


STATISTICAL TEST FOR EQUALITY OF DISPERSION MATRICES


Given a sample of two groups and m variables with group
dispersion matrices $S_1$ and $S_2$, pooled within groups dispersion
matrix $S_p$, and total sample observations $N = N_1 + N_2$,
the hypothesis that the dispersion matrices (and thus the
covariance matrices) are statistically equal may be determined
by the following computations:23

$$A = (N-2)\ln|S_p| - (N_1-1)\ln|S_1| - (N_2-1)\ln|S_2|$$

$$B = \left[\frac{1}{N_1-1} + \frac{1}{N_2-1} - \frac{1}{N-2}\right] \frac{2m^2 + 3m - 1}{6(2-1)(m+1)}$$

$$C = \left[\frac{1}{(N_1-1)^2} + \frac{1}{(N_2-1)^2} - \frac{1}{(N-2)^2}\right] \frac{(m-1)(m+2)}{6}$$

$$D = \frac{m(m+1)}{2}$$

$$E = \frac{D+2}{|B^2 - C|}$$

If C is less than $B^2$, then the test statistic is:

$$\frac{E \cdot A\,(1 - B + 2/E)}{D\,[E - A(1 - B + 2/E)]} \sim F_{D,E}$$

If C is greater than $B^2$, then the test statistic is:

$$\frac{A}{D}\left(1 - B - \frac{D}{E}\right) \sim F_{D,E}$$


23 Box, G.E.P., "A General Distribution Theory for a Class
of Likelihood Criteria," Biometrika 36 (1949), p. 317-346
APPENDIX E


Figures (10) through (19) depict cluster analysis plots
for each of the ten variables being studied. The label "P"
on the vertical axis of each figure represents the proportion
of each cluster that originates from the accident group.

Figure (10)

Last 90 Days

Figure (12)

Figure (13)

Figure (14)

Figure (15)

Figure (16)

Last 48 Hours (Cluster Means)

Figure (17)

Figure (18)
APPENDIX F


CORRELATION MATRIX


The product-moment correlation between two variables,
$X_i$ and $X_j$, is given by

$$\rho_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{[\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)]^{1/2}} .$$

An element of the correlation matrix can be calculated by
the equation

$$r_{ij} = \frac{\sum_{k=1}^{N} (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\left[\sum_{k=1}^{N} (X_{ik} - \bar{X}_i)^2 \sum_{k=1}^{N} (X_{jk} - \bar{X}_j)^2\right]^{1/2}}$$
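
The product-moment correlation and the resulting matrix can be computed as sketched below. The function names and sample columns are hypothetical; each column holds the observations of one variable.

```python
import math

def correlation(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def correlation_matrix(columns):
    """Symmetric matrix of correlations between every pair of variables."""
    return [[correlation(ci, cj) for cj in columns] for ci in columns]

# Hypothetical observations of three variables, for illustration only.
cols = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [4.0, 3.0, 2.0, 1.0]]
R = correlation_matrix(cols)
```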


TABLE VIII — Correlation Matrix for all Data


Variable      1         2         3         4         5
    1      1.0000    0.8484      ?       0.1161      ?
    2      0.8484    1.0000   -0.0961    0.0539   -0.0288
    3        ?      -0.0961    1.0000   -0.4616    0.3131
    4      0.1161    0.0539   -0.4616    1.0000   -0.3818
    5        ?      -0.0288    0.3131   -0.3818    1.0000
    6     -0.0497   -0.0558    0.3516   -0.4539    0.8682
    7     -0.0710   -0.0817    0.2644   -0.5147    0.7092
    8     -0.1163      ?       0.3308      ?       0.5596
    9     -0.1436      ?       0.5284      ?       0.1851
   10     -0.0852   -0.0577    0.4864   -0.1973    0.1932

Variable      6         7         8         9        10
    1     -0.0497   -0.0710   -0.1163   -0.1436   -0.0852
    2     -0.0558   -0.0817      ?         ?      -0.0577
    3      0.3516    0.2644    0.3308    0.5284    0.4864
    4     -0.4539   -0.5147      ?         ?      -0.1973
    5      0.8682    0.7092    0.5596    0.1851    0.1932
    6      1.0000      ?         ?       0.1520    0.1370
    7        ?       1.0000      ?       0.0636    0.0251
    8        ?         ?       1.0000    0.1726    0.0994
    9      0.1520    0.0636    0.1726    1.0000    0.8169
   10      0.1370    0.0251    0.0994    0.8169    1.0000
APPENDIX G


TABLE IX — Principal Components for Control Group*











        Variables   Variables             Variables   Variables
Obs.    1 & 2       5, 6, 7 & 8    Obs.   1 & 2       5, 6, 7 & 8
| 21.3798 26 0.5332 | -7.0333 | 
6.3810 27 Ge le25 [65 sal 
} ~0.1488 28 0.862 1.4137 
1 wl L194 20 Oo2341 | -O.62317 
1.9568 30 @nO> 21 Soe .T 
Eres tore ea Be i /95 Ge2e0 
=-2.6300 3 0.9230 063750 
0.2495 3 Reee200s =-0.58834 
~1.3724 35 ~1 6680 a caks, 
0, 3248 36 SOs ug ~1.7726 
1.8672 37 1 LOU Aig PE. 
0.8991 38 <1 oes! SO er nl 
15973 39 0.2469 e730 tou 
0.9013 LQ -0.9079 ~Oo1941 
ie 2 LT “0.0550 ! 3.973% 
0.4627 ho Oo YRBE “hey eG eye! 
0.9433 1.2 0.9838 oe 
| 0.0804 Lid al, < ILStrs -1.5082 
| 0.2406 L5 1.1599: ~0 67275 
ee elote, 46 0.6936 0.2885 ! 
3 3.60 0.7114 48 Os isgsll 0.5545 
24 | 4.4635 | -0.2801 49 | -2.1592 1.5234 
Pees 93 50 ~2.3989 
*Note: The principal components were extracted from 


standardized data. 
TABLE X — Principal Components for Accident Group*


        Variables   Variables             Variables   Variables
Obs.    1 & 2       5, 6, 7 & 8    Obs.   1 & 2       5, 6, 7 & 8
i 1.3880 26.09 26 , 68 
> | noruer | ~olbhoe 27 | oleear | . oto 
e399 1 =0. 7090 28 OGRA Sart 
t 0.8553 -0.0562 ao #129340 m2. 3258 
2 | Ee | EBS |B | eee | eee 
7 a O ; e a5 £2 
i ~1.3378 123637 32 Re 01973 
8 0.7425 =1.8509 86 =-2 9702 m2 6 43258 
9. | -2,9242 0.0391 St 0.2316 32,9012 
10 -0.6633 2 3258 32 1.5540 | 0.9173 
1 | otzeum | otpose = |} 32 «| otesee | “ar bs6e 
12 . e eV | eo 
ie 0.0578 1.9759 38 “1.6751 0.1790 
14 oe 0.0242 Pe ~1.8h A oily, 
is | te ? 0.8463 He ©.8336m lt 822-3258 
16 1.102 2epaker i O. Ane -0.9173 
17 0.7309 Sey ew sts: ve 0.30 1.5398 
18 1147 0.3751 We “301288 -9 2836 
19 0.7625 -0.4883 0.28 ~O. 023 | 
20 eae OM 7367 1) 1.1647 | -1.3485 | 
3 4 “6 2522 2,0427 | 
2 2.7147 | =1.0970 We 0.292 (a% 
22 0.6060 3.0390 i eee eee 
, wae Gees: 49 oraa25 | ~0Lk909 
0.0150 | -1.5083 50 L791 | 1.3871 
*Note: The principal components were extracted from


standardized data. 
APPENDIX H


Experience (Cluster Means)

Figure (20)

Figure (21)

Figure (22)

Figure (23)